A Data-Driven Approach Using the NFL Big Data Bowl Dataset and Advanced Machine Learning Techniques
Rows: 393,536
Technique: Group Splitting
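Group splitting keeps every row from the same group on one side of the train/test split, so frames from one play cannot leak between training and evaluation. The notes do not name the grouping key. The sketch below is a minimal stdlib version that assumes the group is one play, identified by the Big Data Bowl `gameId`/`playId` columns. `group_split` and `test_frac` are hypothetical names for illustration only.

```python
import random
from collections import defaultdict


def group_split(rows, group_key, test_frac=0.2, seed=42):
    """Split rows into train/test sets so every group lands entirely in one set.

    rows      -- iterable of records (here: dicts, one per tracking frame)
    group_key -- function mapping a record to its group id (assumed: one play)
    test_frac -- hypothetical fraction of *groups* (not rows) held out
    """
    groups = defaultdict(list)
    for row in rows:
        groups[group_key(row)].append(row)

    keys = sorted(groups)              # deterministic order before shuffling
    random.Random(seed).shuffle(keys)  # seeded shuffle of group ids
    n_test = max(1, round(len(keys) * test_frac))
    test_keys = set(keys[:n_test])

    train = [r for k in keys if k not in test_keys for r in groups[k]]
    test = [r for k in keys if k in test_keys for r in groups[k]]
    return train, test


# Toy data: 2 games x 3 plays x 5 frames = 30 frame-level rows.
frames = [
    {"gameId": g, "playId": p, "frameId": f}
    for g in (1, 2) for p in (10, 20, 30) for f in range(5)
]
train, test = group_split(frames, lambda r: (r["gameId"], r["playId"]), test_frac=0.25)

train_plays = {(r["gameId"], r["playId"]) for r in train}
test_plays = {(r["gameId"], r["playId"]) for r in test}
print(len(train), len(test), train_plays.isdisjoint(test_plays))  # -> 20 10 True
```

Because whole plays are held out, the test fraction applies to plays rather than rows. Row counts on each side therefore depend on how many frames each play has.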
Factors to Consider:
- Tackle (0/1; the response variable)
- Future X/Y
- Speed, acceleration, orientation, and direction (S/A/O/Dir) of the defender
- Position / Alignment Cluster
- Number of Defenders in the Box
- Current and future (0.5 seconds ahead) location of the ball
- Orientation, speed, acceleration, and direction (O/S/A/Dir) of the ball carrier
- Velocity/direction difference
- Ball in the defensive player’s ‘fan’
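The notes list several geometric features but not their formulas. The sketch below shows one plausible way to compute the future location, the velocity/direction difference, and the ‘fan’ flag from tracking columns. It rests on three assumptions, not details from the source: a constant-velocity projection that ignores acceleration, `dir` measured clockwise from the +y axis, and a hypothetical 45° half-angle for the fan. All function names are illustrative.

```python
import math


def project(x, y, s, dir_deg, dt=0.5):
    """Constant-velocity estimate of where a player will be dt seconds ahead.

    Assumes dir is measured clockwise from the +y axis, so the velocity
    components are (s*sin(dir), s*cos(dir)). Acceleration is ignored.
    """
    rad = math.radians(dir_deg)
    return x + s * math.sin(rad) * dt, y + s * math.cos(rad) * dt


def angle_diff(a_deg, b_deg):
    """Smallest absolute difference between two headings, in [0, 180] degrees."""
    d = abs(a_deg - b_deg) % 360
    return 360 - d if d > 180 else d


def velocity_diff(s1, dir1, s2, dir2):
    """Magnitude of the difference between two velocity vectors."""
    r1, r2 = math.radians(dir1), math.radians(dir2)
    dvx = s1 * math.sin(r1) - s2 * math.sin(r2)
    dvy = s1 * math.cos(r1) - s2 * math.cos(r2)
    return math.hypot(dvx, dvy)


def in_fan(def_x, def_y, def_dir, ball_x, ball_y, half_angle=45.0):
    """True if the ball lies within +/- half_angle degrees of the defender's heading.

    half_angle is a hypothetical choice, not a value from the source.
    """
    # Bearing from defender to ball, in the same clockwise-from-+y convention.
    bearing = math.degrees(math.atan2(ball_x - def_x, ball_y - def_y)) % 360
    return angle_diff(bearing, def_dir) <= half_angle


# Defender at (10, 20) moving at 4 yd/s along +x (dir = 90).
print(project(10, 20, 4, 90))          # estimated (x, y) after 0.5 s
print(angle_diff(350, 10))             # -> 20
print(in_fan(10, 20, 90, 15, 21))      # ball slightly left of heading -> True
```

Working with angular differences modulo 360 avoids the wrap-around error where headings of 350° and 10° would otherwise look 340° apart.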
Models:
- Penalized Regression (Train: 8019; Test: 1738)
- Random Forest (Train: 8019; Test: 1738)
- XGBoost (Train: 8019; Test: 1738)
- Neural Network (Train: 231,110; Test: 90,090)
Baseline Accuracy:
- Non-Neural Net: 92.85%
- Neural Net: 92.94%
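The notes do not say how the baselines were computed. A common choice, assumed here, is the accuracy of always predicting the majority class (no tackle). Under that assumption, a 92.85% baseline implies that roughly 7.15% of the evaluated rows are tackles. A minimal sketch, with `majority_baseline` as a hypothetical helper:

```python
from collections import Counter


def majority_baseline(labels):
    """Accuracy of a model that always predicts the most common class."""
    _label, count = Counter(labels).most_common(1)[0]
    return count / len(labels)


# Toy labels: 93 non-tackles (0) and 7 tackles (1).
labels = [0] * 93 + [1] * 7
print(f"{majority_baseline(labels):.2%}")  # -> 93.00%
```

With a class balance this skewed, a model's accuracy is most informative when compared against this baseline rather than read on its own.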
Penalized Regression: the best parameters are Lambda = 0.00853 and Alpha = 2.21 × 10^{-5}, with an accuracy of 91.14%.
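The Lambda and Alpha values above match the elastic-net parameterization used by glmnet, where Lambda scales the overall penalty and Alpha mixes the L1 and L2 terms (tidymodels exposes these as `penalty` and `mixture`). Suppose the penalized regression is an elastic-net logistic regression of \(y_i = \mathbb{I}_{\text{tackle}_i}\) on a feature vector \(x_i\); the notes do not confirm this. The fitted objective would then be:

```latex
\min_{\beta_0,\,\beta}\;
-\frac{1}{N}\sum_{i=1}^{N}\Big[\, y_i\,(\beta_0 + x_i^\top\beta)
  - \log\!\big(1 + e^{\beta_0 + x_i^\top\beta}\big) \Big]
+ \lambda\Big[\, \frac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1 \Big]
```

With \(\alpha = 2.21 \times 10^{-5}\), the L1 term gets almost no weight. The tuned penalty therefore behaves essentially like ridge regression with \(\lambda \approx 0.00853\).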
Random Forest: the best parameters are Mtry = 6, Min_n = 6, and Trees = 1848, with an accuracy of 92.86%.
XGBoost: the best parameters are Trees = 652, Min_n = 8, tree_depth = 2, Learn Rate = 1.1, and Loss Reduction = 2.9078823 × 10^{9}, with an accuracy of 92.86%.
\(\sum_{i=1}^{N} (\mathbb{I}_{\text{tackle}_i} - P(\text{tackle}_i))\)
Where: